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I. INTRODUCTION 

The best evidence for the accelerated expansion of the 
universe still comes, after 15 years from the earliest re- 
sults [1, 2], from the supernovae la (SNIa). There are 
now several hundreds SNIa useful for cosmological pur- 
poses, ranging in distance up to z « 1.7. The SNIa have 
been compiled in different datasets [3-8], taking into ac- 
count different sets of possible systematics and making 
use of two different approaches to standardize these pri- 
mordial candles [9, 10]. Nevertheless, every analysis per- 
formed on these datasets confirms that a present cosmic 
acceleration explains satisfactorily the data. The same 
conclusion is now supported also by several other lines of 
evidence, such as measurements of the Baryonic Acous- 
tic Oscillations (BAO) [11, 12], of the anisotropics of the 
Cosmic Background Radiation (CMB) [13] and of the age 
of the oldest stars known [14-16]. 

The recent increase in the number of observed super- 
novae is also driving a huge effort to understand and 
control possible sources of systematics that may under- 
mine the progress in the cosmological interpretation. A 
recent analysis [8] claims indeed the systematic uncer- 
tainties are already larger than the statistical ones and 
the issue will be much more important in the near future 
as we forecast an increase in the number of observed su- 
pernovae by 1 order of magnitude in the next ~ 5 years 
(for example, with the Dark Energy Survey [17]) and by 2 
orders of magnitude in the next ~ 15 (for example, with 
the Large Synaptic Survey Telescope [18]). It is thus 
necessary to continue investigating the SNIa datasets in 
search of such systematic effects and of additional cos- 
mological information. On one side, in fact, we are al- 
ready aware of many effects that could come into play 
to alter the SNIa apparent magnitude: contamination 
from non-la supernovae, dust absorption in both host 
galaxy and Milky Way, gravitational lensing distortions, 
local velocity flows et cetera; not to count systematics 
which arise from selection effects [19-21] and from rely- 
ing solely on photometry (typically in just a few band- 



passes) and the flux reference of such filters (for a review, 
see for instance [22]). On the other side, non standard 
cosmological models might affect our parameter estima- 
tion: for instance, any anisotropy in the expansion rate 
would show up as an anisotropy in the SNIa cosmological 
parameters. There have been of course many searches for 
such systematic biases. All of them, however, assume a 
specific effect (say, gravitational lensing [23] or cosmo- 
logical anisotropy [24-26]) and test whether this effect is 
enough to make some SNIa incompatible with the oth- 
ers. In other words, one proceeds by testing a specific 
prejudice. 

In this paper we propose an alternative approach. We 
wish to perform a systematic search of biases without 
having any preferred selection criteria. In other words, 
we try to answer the following question: is there any 
subset of SNIa that is statistically incompatible with the 
others? That is, is there a subset of SNIa that could 
be described by parameters which are incompatible with 
those that describe the other SNIa? In a sense, this is a 
direct generalization of the search for outliers. Instead of 
searching for single outliers, i.e. SNIa that appear statisti- 
cally incompatible with the others (say, some parameters 
that describe their light curves are too far off from the 
others or their distance moduli just end up very far from 
the overall Hubble diagram), we search for subsets of, 
say, dozens of SNIa at once whose parameters are incom- 
patible with the others. As will be shown in Section II B, 
the proposed generalization reduces to the standard out- 
lier search in the limit in which the whole data is divided 
into two complementary datasets, one of which contains 
a single element. 

The standard tool to compare whether a particu- 
lar dataset is compatible with a proposed model is 
the goodness-of-fit test, which gives well defined prob- 
ability statements about such agreement. However, the 
information obtained is limited, and this simple analysis 
may hide problems in both data and model. For instance, 
if a given model parameter affects only a small fraction 
of the data, even if such parameter turns out to disagree 
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with this fraction the goodness-of-fit may still claim an 
overall good fit. To address this issue, a modification 
of the test, dubbed parameter goodness-of-fit, was pro- 
posed in [27] and later extended in [28]. The parameter 
goodness-of-fit method nevertheless still relies on com- 
parisons of x 2 values which are only sensitive to the local 
minimum, and not to the entire likelihood. Here, in- 
stead, we will adopt a fully Bayesian approach so as to use 
all the information available, e.g. a possible overlapping 
of the likelihoods surfaces. We dub internal robustness 
the fundamental quantity evaluated in this method. The 
name is motivated by the analogous quantity robustness, 
which was recently defined in a context in which the two 
datasets refer to different observational probes [29] (and 
which we will henceforth refer to as external robustness, 
for differentiation). In particular, Ref. [29] showed that 
the robustness is an estimator "orthogonal" to the Fig- 
ure of Merit, which is sensitive to the relative orientation 
of the two probes but not to the distance between the 
two-experiment confidence regions. 

This paper is organized as follows. In Section II we 
will introduce the formalism of the internal robustness. 
In Section III we will describe how we will systemati- 
cally search for bias in SNe data. In Section IV we will 
show the results relative to a biased test catalogue, the 
Union2.1 catalogue augmented with the supernovae that 
did not pass the quality cuts, and the actual Union2.1 
dataset. As we will see, our method will be able to detect 
the systematic bias in the first two catalogues. Moreover, 
our analysis does not show signs of systematic effects in 
the actual Union2.1 catalogue. Finally, we will give our 
conclusions in Section V, and explain some technical de- 
tails in Appendix A. 

We will adopt the following notation. Bold face will 
distinguish a vector x or matrix A from their compo- 
nents Xi and Aij , a superscript t will denote a transposed 
vector or matrix, \A\ will represent the determinant of a 
matrix A, and a hat will indicate the best-fit value of the 
corresponding quantity. 



II. FORMALISM 

A. Bayesian evidence and its Fisher approximation 

Let us first of all recall some statistical definitions in 
the Bayesian context [30, 31]. The Bayesian evidence is 
defined as 



to write 



£{x;M) = I C(x;6 M )V(e M )d n e M 



(1) 



where x = (xi, X2, xjy) are N random data, 9 M — 
(6\,9 2 , ■■■,9 n ) are n theoretical parameters that describe 
the model M, C is the likelihood function, and V is the 
prior probability of the parameters 6 M . If 'P(M) is the 
prior on a particular model M , we can use Bayes' theorem 



£(M;x) = £{x;M) 



V(M) 



(2) 



i.e. the posterior probability £ of having model M given 
the data. We can finally use the latter equation to com- 
pare quantitatively two models taking the ratio of their 
probabilities (so that V(x) cancels out): 



£(Mi;aQ V{M X ) 
£{M 2 ;x) ® 12 T(M 2 )' 



(3) 



where we introduced the Bayes ratio (sometimes referred 
to as Bayes factor) 



Bl2 = 



£{x;M 2 ) 



(4) 



Often, however, one assumes that V(M\) = "P(M 2 ) and 
we adopt this choice here. A Bayes ratio Bi 2 > 1 (< 1) 
says that current data favors the model Mi (M 2 ). As we 
will sec in the next Section the Bayes ratio will be central 
in the definition of internal robustness. 

Suppose now the likelihood is gaussian in the data with 
covariance matrix X and expected means rrij. Then 



C = (27T)- Ar / 2 |S|- 1 /2 e -k 2 ; 

where the \ 2 is defined as: 

X 2 = (xi - rrbiY E^fa - rrij) . 
The best-fit (minimum) x 2 is then 

X 2 = {xi - fhiY T,T}{xj - fhj) , 



(5) 



(6) 



(7) 



where m, are the best-fit means. The maximum of the 
likelihood is then: 

£ max = (2 7 r)- JV / 2 |S|- 1 / 2 e-^, (8) 

so that we can rewrite Eq. (5) as 

^max ^ 2 ^ ^ ^ . (9) 



According to (1), the iV means m, depend on n parame- 
ters 6k, i.e. mi = riii{6). 1 The best-fit values fhi are then 
functions of the best fit estimators Ok, i.e. m, = mAOj. 
Here for simplicity we assume that S does not depend on 
the parameters, but this assumption can be easily lifted. 

Let us assume now that the likelihood can be approx- 
imated near by a Gaussian distributions also in the 
parameters, i.e. 



C(x;9) ~ f(x;0) = C a ^e~* { 



, (10) 



1 Note that to simplify the notation we will drop in the following 
equations the superscript M of the model in question. 
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where L^ in the exponential factor is the inverse of the 
covariance matrix of the likelihood (or Fisher matrix, see 
e.g. [31, 32]) and where now the data are inside the best- 
fit estimators 6i. Similarly, we assume a gaussian prior 
so that 



V(0k) = 



IP] 1 / 2 

(2tt)™/ 2 



(11) 



where Oi are the prior means and P is the prior matrix. 
It is now possible to evaluate the evidence analytically. 
Using the relation 



d n x e -h x±Ax + vtx — ( 27r )"^ 2 „WA~ x v 



|A|V2 



(12) 



e = f f(x;e)v(e)d n e 

|p|l/2 



one finds 



lax (27r)"/ 2 



exp 



- 6*) Lij(0j - 6j) 



d n e. 



£ =C n 



\ p \ 1/2 e -h{o-e)\ F -^{e-e) 



IFIV2 



(13) 



(14) 



where F = P + L . It will be convenient to rewrite the 
above equation as 



C 



PI 1 / 2 -^(^9 t L9 + 9 t P9-9' t F9' 



max \f\i/2 e 



(15) 



where 0' = F~ 1 (L6 + P6). In the limit in which the 
prior is very broad (so that Pjj <C L+j and therefore 
Fij — >• Lij) the argument of the exponential vanishes and 
we have simply 



£ ~ (2K)-»'*e-& |P|1/2 

^ - ^ 6 |£|l/2|S|l/2 



(16) 



B. Definition of internal robustness 

Eq. (1) and its Fisher approximation Eq. (15) allow 
to compute the evidence £ tot for a given cosmological 
model Mc and dataset dtot- Suppose now that the data 
are actually coming from two completely different, and 
therefore independent, distributions. In other words, as- 
sume that a subset d 2 of c? to t actually depends on a to- 
tally different set of parameters (say, the properties of 
the SNIa progenitors or galaxy environment), i.e., it is 
fully described by the systematic parameters of model 
Mg. Contrarily, the complementary set di = d to t — d 2 is 
still described by the cosmological parameters of model 



Mc ■ In this case the total evidence can be written as the 
product of the individual evidences: 



find — £ l £i 



(17) 



We can now use the Bayes ratio of Eq. (4) to quantify 
which hypothesis is favored. We thus compute 2 



<Stot,ind 

and define 



£{x- M c 



£tot 

And ~ £(x i; M c )£(x 2 ;M s ) ' 



R = logi3 t ot,ind 



(18) 



(19) 



as the internal robustness. As discussed in the Introduc- 
tion this quantity is related to the external robustness 
originally defined simply as robustness in [29]. 

The previous equations give the general definition of 
internal robustness. It is however useful to evaluate ana- 
lytically R in the Fisher approximation. Using Eq. (15) 
one finds: 



B, 



tot.ind 



\L 1 + P C \\L 2 



1/2 



c 



tot 
max 



X e 



L tot + P c \\Ps\ J C [2] 



\Lx\lL2 



1/2 



IS1IIS2 



1/2 



1 ^2 

~2 (.Xtot-Xl-X 2 ) 



■:-'() i 

where in the bottom line we simplified using Eq. (16), i.e., 
assuming the prior to be much broader than the likeli- 
hoods. Notice that the (2tt)~ n / 2 factors cancel out since 
JVtot = Ni+ N 2 , where iVj is the size of the subset dj. So 
far we have assumed gaussianity of the likelihoods, the 
existence of two independent distributions and the use of 
a very broad prior. If we make the additional assumption 
that the data points themselves are independent of one 
another (henceforth referred to as "raw data"), then the 
covariance matrix is diagonal, one has |Stot| = |^i||^2| 
and we get the final formula for the internal robustness 
in the Fisher approximation: 



R = R + -log 



(^^)4(xL-2i-xi), (21) 



where Rq is a constant coming from the unknown deter- 
minant of the systematic prior, Rq = — 1 log \ Ps\- 

The first factor in Eq. (21), formed out of the determi- 
nants, expresses Occam's razor factor of parameter vol- 
umes, while the second penalizes R if the two probes are 
very different from each other (so the hypothesis that 
they come from different models, or equivalently that 



2 It is straightforward to generalize Eq. (18) to more than two 
partitions such that £ in[ j = E\ £2 £3--- 
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systematics are important, is favored). We expect, there- 
fore, the internal robustness to be a measure of how much 
subsets of a dataset overlap: the more they do, the more 
compatible the two datasets are. 

In order to help intuition, we can evaluate R for the 
simplified case in which a) we can neglect the logarithmic 
part; b) there is only one parameter; c) the errors are all 
identical [a^ — a); and d) the subset d,2 consists of a 
single point xi- Then we have xi = and f° r -^1 — 
Ntot > 1 



i.e. R reduces to the scatter of X2 from the best fit fhtot 
evaluated by fitting the remaining N\ elements. There- 
fore a large and negative R means X2 is an outlier. 



of d,2 is constrained within a fixed realization of dtot , on 
which the iR-PDF depends. As a simple example of the 
difference between the latter two distributions, the iR- 
PDF has a compact and discrete support as it is sampled 
over a finite number of subsets. We will see in Section 
IV A 4, however, that for the datasets treated in this pa- 
per the binned iR-PDF (neglecting Occam's razor factor) 
is rather close to a \ 2 distribution with n to t degrees of 
freedom. 

The iR-PDF gives the probability that a given value of 
R is realized among the available subsets. However, dif- 
ferently from the eR-PDF, the iR-PDF cannot be used to 
assess the significance of a given value of R. To do so, we 
need to compute a distribution of iR-PDFs, which we will 
obtain by evaluating the robustness in many Montecarlo 
realizations of mock catalogues. 



C. Statistical properties of internal robustness 

In order to understand the statistical properties of the 
internal robustness let us start by fixing the subset c? 2 
to some d\ (d\ is just the complementary subset). Let 
us also assume that data come from the cosmological 
model Mc only. From Eq. (21) we can then calculate 
the probability distribution function of R(d^) (denoted 
as eR-PDF). If one also neglects the logarithmic term 
(Occam's razor factor) , then (in this very particular case) 
R becomes the parameter goodness-of-fit test [27], and it 
was shown [28] that R(d%) is distributed as a x 2 distri- 
bution with ntot degrees of freedom (d.o.f.). The full eR- 
PDF is then a modified \ 2 distribution with n to t d.o.f., 
not too distorted as Occam's razor factor is logarithmi- 
cally suppressed. If we now drop the assumption that 
data come from the cosmological model Mc, we can use 
the (fiducial) eR-PDF to assess the significance of a given 
value of i?(e?2), f° r example of a low value that could in- 
dicate that the dataset is systematics driven. We remind 
indeed that in our fully Bayesian context the robustness 
R is related to the Bayes ratio of the evidences and a 
small (large) R disfavors (favors) the description of the 
dataset by the cosmological model Mc alone. 

As far as the internal robustness is concerned, however, 
we do not intend to fix c?2 to a particular subset (even if 
we may still do so if useful) . The above way of proceed- 
ing is suited indeed to the external robustness, the aim 
of which is to analyze two different datasets (e.g. CMB 
and BAO). The idea behind the internal robustness is 
instead to keep the total dataset fixed (i.e. not to be con- 
cerned with the statistical distribution of R(dtot)) and 
evaluate R{d2) for all the possible partitions of dtot, thus 
generating a distribution which we call iR-PDF. Inter- 
nal and external robustness have therefore very different 
statistical properties. 

The fiducial iR-PDF (assuming that data come from 
the cosmological model Mc only) is a highly nontrivial 
object. Even if one neglects Occam's razor factor, it is 
not a x 2 distribution with n to t d.o.f., as the sampling 



D. Systematic parameters 

Generally speaking, there are two possible choices for 
the parameters for the subset ^2- In the first case, 
the parametrization is analog to the cosmological one, 
e.g. {ri m o, ^ao, a}, where f2 m o,^Ao are the present-day 
matter and dark-energy parameters and a is combination 
of the unknown magnitude offset M (sum of the SNe 
absolute magnitudes, of fc-corrections and other possi- 
ble systematics) and the Hubble parameter Hq: a = 
M + 25 - 51og 10 H [31]. In this case P s = P c This 
would be the preferred choice if we expect some of the 
SNe to be better described by different cosmological pa- 
rameters. For instance the parameters could differ in 
different line-of-sight directions (say, 3 different Hubble 
parameters H x0 , H y0 and H z0 , as in anisotropic mod- 
els with shear [26, 33]) or in different redshift shells (say, 
fij£ and f2™g as in some inhomogcncous models [34-36] ) . 
We could also erroneously interpret the unknown system- 
atic parameters as the cosmological ones and therefore 
employ the same parameter names and the same prior 
function. 

In the second case, not to be explored here, one could 
choose a completely phcnomcnological parametrization 
such as 

3 

m(z)=J2 x ifi( z ) (23) 

i=0 

for the theoretical magnitudes. Here the functions f(z) 
could be arbitrarily chosen, e.g. fi(z) — z % . If j = 2 
we have then m — Xq + \%z + X2Z 2 . Since the A's are 
linear parameters, both best fit and Fisher matrix would 
be analytical. This second choice would be appropriate if 
one expects some of the SNe to be dominated by system- 
atic effects unrelated to cosmology, say because of sample 
contamination or strong environmental effects. 

Restricting ourselves to the first case, we can marginal- 
ize over a analytically [31]. Let us define the sums (re- 
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member that the covariance matrix E is diagonal) 



S n — 



f M[ 



(24) 



where N' = N 12 , Mi 



51og 10 d L (zi), 



= d^H /10pc and d^ is the luminosity distance. 
Marginalizing over the constant offset a we obtain 



- log£ 
where 



1, S 
- log — 
2 B 2tt 



jv' 



J^logi 



27rer,- 



i=l 



x 2 

\ mar 



5? 
S 



(25) 



(26) 



As the likelihood is gaussian in the data, minimizing the 
marginalized Xmar with respect to {^ m o,^Ao} gives the 
same result as minimizing the original x 2 with respect 
to {f2 m o, ^ao, a}. Therefore, even though we are in a 
Bayesian context, Xmar is still distributed as a x 2 with 
N' — 3 degrees of freedom. Note also that if the original, 
non-marginalized data is independent, then so will be the 
data marginalized over a. To see this, one can rewrite 
the marginalized likelihood as 



Note that in the following results we will express the 
numerical values of the robustness as R — Rq — log(27r), 
to which we will refer simply as R. 

In Section IV we shall use supernova data provided 
by the Union2.1 [8] collaboration in the form of a 3- 
column matrix, each row {zi, Xi, cr.;} consisting of rcdshift 
Zi, distance modulus Xj and distance modulus error <7j. 
This matrix was computed using the SALT2 method [10]. 
Used in this way, the supernova data are rigorously not 
independent, as the values of the distance moduli are 
obtained after processing the raw data assuming a par- 
ticular cosmological model (see, e.g. [37]). As in this first 
work we mainly aim at presenting the method, we will 
ignore however such correlations. 3 Note, however, that if 
on one hand this is a caveat for the results that follow, 
on the other hand it presents an opportunity to cross- 
check the Union2.1 data (often naively employed in such 
concise form) for leftover systematics. 



III. SCANNING THE SUBSETS 

In order to obtain the full iR-PDF we have to compute 
Xi 2 and Li.2 for every partition di <2 of a given dataset 
of iVtot elements. There are 



C tx exp 



(27) 



where £ =S~ 1 — H~ 2 /Sq, which for a diagonal £ is 
still a diagonal matrix. 

Recalling that in general a Fisher matrix is given by 

d 2 log£ 



d0„d0 a 



(28) 



we get for the cosmological Fisher matrices L to t, L\ and 
L 2 the general expression 



l s 

^fi-i.nn 



ElVli 
a 



1 

So 

Mi 



(SiS 



1,P9 



Si 
So 



1- Si^Si^ 

E M^pM^g 
rr 2 



(29) 



S ^ 



1 



In 10 ^ a? 



V 



dLi,pdh 



l.q 
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In 10 





Li 



d-Li 



Mi 



Si 
So 



1 dn,p 



a Lj.q 



Finally, the robustness in the Fisher approximation is 
given in this case [see Eq. (25)] by: 



1 « 10 



,0-3iVtot 



(31) 



possible partitions and we find that in the present ap- 
plication a complete scan of all subsets is unfeasible for 
Ntot ^ 20. The issue then arises of which subset H 
among all possible partitions to form. We extract from 
the entire Union2.1 catalogue a number T of subsets d 2 
composed by a number N 2 of SNe between N 2>m i n and 
-^2, max chosen at random among all the possible combi- 
nations. However, a pure random sampling (i.e. uniform 
in the space of all possible partitions) would pick with ex- 
tremely high probability only the most populated subsets 
(in our case with N 2 ~ N2 >ma x)- Since we would like to 
explore also the smaller subsets, we adjust the selection 
so as to obtain a distribution approximately uniform in 
N 2 (i.e. approximately equal number of subsets for every 
value of N 2 ). We call this particular set 3(T), and the 
following analysis depends on it. In particular T gives 
the statistics of the analysis, while the definition of S 
determines the way the sets have been chosen. We will 
consider different strategies in forthcoming work. 

The upper limit we use is iV2 !max = N to t/2- This 
is due to the fact that, as we are using the cosmo- 
logical parametrization, the robustness is symmetric in 
the datasets d\ , d 2 so that scanning half of the cata- 
logue is enough. In order to discuss iVa.min it is impor- 
tant to stress that we consider a much larger parameter 



R = Rq 



2-log(2^) 



1 



■log 



So,iSq,2 \Li\\L 2 

Schtot l-^tot 



2 \A-tot 



(30) 



xi 



xi) 



3 It is worth stressing that the important factor is the indepen- 
dence of the data points, not of their error bars, which even if 
completely correlated would not affect the results. 
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Figure 1. Comparison between the exact robustness Re and 
its Fisher approximation Rf for a random set of subsets. Each 
subset is represented by a point, which is color coded accord- 
ing to the corresponding number N2 of SNe in the subset di- 
This analysis clearly shows that, as far as the robustness is 
concerned, the Fisher approximation holds better for larger 
subset sizes. See Section III for more details. 



space than the usual physical one, as f2 m o and £l\o also 
parametrize the (possibly cosmology unrelated) system- 
atic parameters. The range we adopt is —10 < £l m Q < 10 
and —20 < fl\o < 10, and we exclude the {f2 m o,f2Ao} 
region of the parameter space for which the expansion 
rate H(z) is negative for z < 2, which well accommo- 
dates the redshift range of the Union2.1 dataset. This 
is a relaxation of the usual no-big-bang excluded region, 
see Appendix A for more details. The value of N 2ymin 
is then found by demanding that the likelihood of the 
smaller subset d 2 has support within the parameter space 
considered. We have found empirically that a value of 
A^min = 10 satisfies on average this requirement. 

We will now explain how we actually computed the 
robustness. For subsets with N 2 larger than a certain 
A2.mcd we have found that the likelihood can be ade- 
quately represented by a Gaussian distribution in the pa- 
rameter space, so as to legitimate the Fisher approach. 
The advantage of using the Fisher matrix is in the com- 
putational speed gain, as only the maximum of the like- 
lihood and few derivatives have to be found numerically 
For smaller subsets N 2 < N 2mc< i, however, the likelihood 
deviates from a Gaussian distribution and we are forced 
to integrate the likelihood numerically over the full pa- 
rameter space. In order to empirically find A^.mcd we 
have computed both the exact robustness Re of Eq. (18) 
and its Fisher approximation Rp of Eq. (21) for a ran- 
dom set of subsets. Fig. 1 shows how the discrepancy 
decreases as the subset d 2 becomes larger. As we will 
see below in Fig. 4, the iR-PDF varies on a scale of order 
unity in robustness. Therefore, we want to keep the error 



in the robustness computation at a level < 0.1. We found 
that the value iV^med = 90 satisfies this requirement. 



IV. RESULTS 

In analyzing the Union2.1 catalogue of 580 SNe we will 
restrict to the case of the curved ACDM model, i.e., we 
allow for spatial curvature but fix the equation of state 
parameter (w = —1). The cosmological parameters are 
therefore the present-day density parameters £l m o and 
f^Ao, with the addition of the nuisance offset a. We will 
consider other parametcrizations in forthcoming work. 

Our results will be divided into three parts. First 
in Section IV A we will test our method with a mock 
dataset whose SNe were drawn from two very different 
cosmological models. In Section IV B we will analyze the 
Union2.1 dataset [8] including previously-excluded super- 
novae (i.e., SNIa that did not pass all the quality selection 
cuts); this also should be a test for our method. Finally, 
in Section IV C we will present our results regarding the 
actual Union2.1 dataset. 

Before dealing with our results, it is useful to define 
what we mean by "mock catalogue". A mock catalogue 
M. for a given dataset V is a synthetic unbiased dataset 
generated using the best-fit model of T> as the fiducial 
model. More precisely, we keep fixed the redshifts and 
errors Oi of T>, and change the distance moduli to £ m ock — 
Wfiduciai+ ^random where 2; random is drawn from a gaussian 
distribution of zero mean and standard deviation Cj. 



A. Systematics-driven dataset 

1. Dataset and iR-PDF computation 

In order to test our method we have generated a 
systematics-driven dataset 2?Eds i n the following way. 
First we have created a mock catalogue of Union2.1. 
Then we replaced 100 randomly chosen distance-modulus 
entries with others drawn taking the Einstein-de Sitter 
(EdS) model (flat, matter-dominated) as the fiducial one 
(instead of the best-fit model of Union2.1). These 100 
SNe are shown in red in Fig. 2. One expects for 2?EdS very 
low robustness values, as for the subset of 100 EdS SNe it 
is clearly favored the possibility that d\ and d 2 have inde- 
pendent cosmological parameters (which, we remind, we 
use to parametrize also the systematics) . In the case of 
d 2 being exactly the subset of 100 EdS SNe, one obtains 
the plot of Fig. 3 which shows the 1, 2 and 3<r confidence- 
level contours for dtot and d\ , d 2 independently. The con- 
tours are indeed far apart and the robustness is extremely 
low, i? m in — —97.5 (as previously mentioned, we know a 
posteriori that R is typically a 0(1) quantity). Fig. 3 
summarizes nicely the ultimate goal of internal robust- 
ness, that is, to go from the original set in green to the 
"decontaminated" set in red. The contours do not look 
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Figure 2. Plotted is the distance modulus of the SNe of 
the systematics-driven dataset £>EdS in which 100 SNe have 
been drawn from the EdS model as fiducial model (red data) 
and the remaining from the best-fit model of Union 2.1 (blue 
data). Without the different coloring it would be difficult to 
spot by eye the EdS SNe (the challenged reader could try it 
on a B&W printer). See Section IV A 1 for more details. 



significantly different (i.e. same precision), but their po- 
sition has substantially moved by ~ 3er (i.e. much better 
accuracy) . 

We will now pretend that 2?EdS is made of real data 
and test the sensitivity of our method with it. Before 
proceeding, however, one can see that the dataset is sick 
by means of the standard goodness-of-fit test given by 
X 2 = X 2 /(Ntot — 3) (the cosmological model has three 
parameters). For X>EdS we find indeed x 2 — 1-39, that is, 
the catalogue is incompatible with the theoretical model 
at 6-cr level. Nevertheless we advocate that internal ro- 
bustness is a better test than the standard goodness-of- 
fit; in other words we would like to show that we can give 
a stronger exclusion. In order to do so, we will "normal- 
ize" 2?Eds by adding a constant of* t tra to the errors cr, 
such that x 2 — \. The idea is that the normalized cata- 
logue passes the goodness-of-fit test but not the internal 
robustness one. 

The technique of normalizing a catalogue to \ 2 — 1 is 
actually often used in supernova cosmology [38] . SNe are 
indeed imperfect standard candles with a residual scat- 
ter, called the intrinsic dispersion, of roughly cr int ~ 0.1 
magnitudes. As SN physics is not yet thoroughly un- 
derstood, (7i n t is not tightly constrained, and it is often 
determined by demanding that x 2 = I- 4 SNe catalogues 
are therefore perfect candidates for testing the internal 
robustness method. Since the Union2.1 data already in- 
clude implicitly a <7j„t, what we call of„ t tra is the amount 



4 Note that this procedure may hide problems with the theoretical 
model being used, as it is shown by the very example of X>EdS- 
For further discussion and alternatives see [39-41]. 
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Figure 3. 1, 2 and 3<r confidence-level contours for subsets d\ 
(blue) and d,2 (red), where d,2 is made of the 100 systematics- 
driven EdS SNe shown in red in Fig. 2. The contours are far 
apart and the robustness is indeed extremely low, R ~ —97.6. 
Also shown in green are the confidence-level contours for the 
complete dataset d to t- The 100 EdS SNe clearly bias the 
likelihood surface of the full dataset. See Section IV A 1 for 
more details. 



to be added to a mt in order to have % 2 = 1. For 2?EdS one 
needs of* t tra — 0.0356 magnitudes; that is, to increase the 
errors. As a consequence the contours of Fig. 3 become 
slightly broader and the minimum robustness is larger, 
even though still extremely low: i? m i n ~ —70.9. 

Finally, we computed the internal robustness for the 
(normalized) 2?EdS dataset using the set of partitions 
3(T) with a statistics of T = 5 • 10 5 subsets (see Sec- 
tion III). As explained in Section II C, in order to deter- 
mine if the iR-PDF of 2?EdS passes or not the robustness 
test, we have to generate a distribution of iR-PDFs, which 
we obtain by computing the internal robustness for 100 
mocks A4j. Each mock iR-PDF is generated using the 
same set prescription H(T) but with a lower statistics of 
T = 5 TO 4 , as the fluctuations among the mocks are more 
important than the "sampling" fluctuations due to pois- 
sonian errors. The mock catalogues M.j have been also 
normalized to x 2 — 1- It is worth saying at this point 
that the robustness test is quite expensive from a com- 
putational point of view. The numerical results of this 
paper have been obtained with Wolfram Mathematica 
8 and the average CPU time to calculate the robustness 
value of a given partition was ~ 2 — 3 seconds (luckily the 
computation is easily parallelized) . The size of T and the 
number of mocks used are therefore the main constraints 
in the final results: the higher the statistics, the clearer 
the signal one may get. 
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Figure 4. [Left]: binned internal-robustness PDF (orange curve) for the dataset ©Eds of Fig. 2 against cr-bands (gray areas) 
from (unbiased) mock catalogues. [Right]: same as the left panel for larger bins (dashed lines) and lower robustness values. 
Moreover, the bin height values hk have been translated and scaled according to hk — > (ht, — hk)/<Jh k in order to uniformly 
show the signal across the various bins. One can clearly see that while the body of the iR-PDF is compatible with the mocks, 
the low-robustness tail is detected as being driven by systematics. The £>EdS datum relative to the bin (—20, —6] lies at 4.2-a 
confidence levels. See Section IV A 2 for more details. 



2. Analysis of the iR-PDF 

The left panel of Fig. 4 shows the iR-PDF of X> E ds 
(solid orange line), which has been obtained by binning 
the robustness values within JV& bins of widths ARk. The 
same binning prescription has been used to calculate also 
the bin heights hkj of the unbiased mocks Aij, which 
have been used to compute mean hk and standard de- 
viation (Tftj. for the Nt, bins. By assuming that the bin 
heights within a given bin are distributed according to 
a gaussian PDF with mean hk and the standard devia- 
tion (Jh k , we have then drawn the cr-bands bounding a 
systematics-free iR-PDF (gray areas in the left panel of 
Fig. 4). From this plot it seems as if the iR-PDF of 2?EdS 
passes the internal robustness test. 

One expects, however, the signal to be concentrated in 
the low-robustness tail of the PDF, in which systematics- 
driven SNe should dominate one of the two partitions 
di j2 - The spurious iR-PDF can indeed be loosely thought 
as the sum of a systematics-free PDF and a systematics- 
driven perturbation. The systematics-free PDF will be- 
have according to the cr-bands, which do not change siz- 
ably if the biased subset is sufficiently small. The per- 
turbation will then be given by the 100 EdS SNe which, 
when e?2 coincides with them, will add a single point to 
the histogram at i? m i n ~ —70.9. This point is clearly 
not observable in practice as one cannot scan all the 
10 174 possible subsets. At the expense of a higher ro- 
bustness, however, there will be many subsets g?2 with a 
fraction of the 100 EdS SNe, thus generating a stronger 
low-robustness tail in a biased dataset. 

In order to analyze the low-robustness tail of the PDF, 
we have to drop the gaussian assumption. The latter is 
indeed a good approximation only for the bins in the 



body for which hk/<Jh k 3> E but not in the tail as il- 
lustrated in Fig. 5 where the distributions for two low- 
robustness bins are shown. The reason is that the iR- 
PDFs of the mocks have a compact support which have a 
variable i? m i n - At a given low value R in the tail, there- 
fore, some of the robustness distributions of the mocks 
will be very close (if not identical) to zero, with the con- 
sequence that the distribution of the bin heights will be 
skewed, thus deviating from gaussianity. Note that this 
cannot be cured by using a higher statistics T as this 
feature is related to the existence of an i? m i n for the iR- 
PDF. 

In order to analyze the tail, the standard way to pro- 
ceed would be to generate many mock iR-PDFs so as to 
compute the non-gaussian distribution in the bin heights 
numerically. As remarked earlier, however, it is numer- 
ically expensive to obtain an iR-PDF and we limit our 
sample of mock catalogues to 100. In order to properly 
quantify the signal we have then proceeded in two ways. 
The first and most obvious is to check if any mock has a 
bin height larger than 2?Eds- As shown by Fig. 5 this is 
not the case for the lowest bin (where we expect the sig- 
nal to be strongest) , and we can so conclude that 2?EdS is 
systematics-driven at a confidence level better than 99%, 
or 2.6cr. 

The second way is to use a template to be fitted to 
the data, and then use the fitted template to assess the 
significance of the datum relative to 2?EdS- We used as 
template the Pearson distribution [42, 43], which is found 
by demanding that its first four moments coincide with 
the moments from the bin heights. The result is depicted 
by a solid black in Fig. 5, where also the gaussian dis- 
tribution is plotted for comparison. As one can see the 
fitted Pearson distribution correctly reproduces the skew- 
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Figure 5. Binned distributions of bin heights for the lowest (left panel) and next-to-lowest (right panel) robustness bins of the 
mock robustness distributions used to make the <r-bands shown on the right panel of Fig. 4. The bin height corresponding to 
©Eds is shown as an orange vertical line, and corresponds to the orange curve on the right panel of Fig. 4. The dotted line is 
a fitted gaussian distribution. The solid line is the fitted Pearson distribution which has been used to draw the cr-bands shown 
on the right panel of Fig. 4. See Section IV A 2 for more details. 



ness of the data, and in particular does not go to negative 
bin height, as instead does the gaussian distribution for 
the lowest bin. 

Having calculated the er-levels with the Pearson tem- 
plate, we can now show our final results in the right panel 
of Fig. 4. We have used larger bins as compared to the 
plot in the left panel because the bins extend to lower 
robustness values, which have lower statistics. Also, the 
bin height values hk have been translated and scaled ac- 
cording to: 



0.0020 F 



hk 



(32) 



so as to uniformly show the signal across the various 
bins. Fig. 4 clearly shows how the body of the iR-PDF of 
22 Eds passes the robustness test, as opposed to the low- 
robustness tail which is detected as being systematics- 
driven. More precisely, the 2?EdS datum relative to the 
bin (—20, —6] lies at 4.2-a confidence levels, while the da- 
tum relative to the robustness bin (—6, —4] lies at 2.8-a 
confidence levels. Note that in the previous results we 
did not include the error in the datum relative to 2?Eds- 
The latter is indeed negligible as the iR-PDF of 2?EdS 
has been found with a statistics much higher than the 
one relative to the iR-PDF of the mocks. 

Finally, one would expect non-negligible fluctuations in 
the third (skewness) and forth (kurtosis) moments com- 
ing from a sample of only 100 data, and in order to assess 
the uncertainty on the exclusion signal, one may proceed 
as follows. Repeat enough times: a) generate 100 ran- 
dom values from the fitted Pearson PDF; b) fit again 
the Pearson PDF to this new data and calculate the a 
confidence level. If we now apply this routine to the two 
lowest bins of Fig. 5 for which the signal is 4.2a and 2.8a, 
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Figure 6. Correlated distribution of bin heights for the two 
lowest-robustness bins of Fig. 5. The 1, 2 and 3<r confidence- 
level contours have been calculated using the gaussian values 
A\ 2 = 2.30, 6.17, 11.8. The bin height corresponding to 2?EdS 
is shown as an orange disk. See Section IV A3 for details. 



respectively, we find that the signal is > 3.3cr and > 2.3a 
at 95% confidence level, respectively. 



3. Taking correlations into account 

When interpreting the results in Fig. 4 one cannot ne- 
glect the correlations among the bins ARk- In other 
words, to get a reliable estimate by eye one is supposed 
to consider only one bin at a time (namely the bin with 
the strongest signal) and is not allowed to combine the 
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Figure 7. Binned iR-PDF using for the robustness its frequen- 
tist limit — 2i?f rcq = Xtot ~ Xi ~ £!• Distributions relative to 
two unbiased mock catalogues are shown in red and blue, 
while the binned \ 2 distribution with 3 d.o.f. is shown with 
a black curve. The error bars stand for 3<r. It is clear that 
the x 2 distribution with 3 d.o.f. is not the correct PDF even 
though it captures the overall shape. See Section IV A 4 for 
more details. 



signals from various bins as (all) the bins are most likely 
correlated. The same procedure can also be followed nu- 
merically to get the simplest conservative estimate (see 
previous Section). 

As the signal is concentrated in the low-robustness tail, 
where the gaussian assumption is not a good approxima- 
tion, we compute the correlation by building a histogram. 
In Fig. 6 we show the corresponding contour plot for the 
two lowest-robustness bins together with the correspond- 
ing datum of 2?Eds (orange disk). As with only 100 mocks 
it is not possible to obtain a good enough PDF which can 
then be integrated to calculate the contours, in Fig. 6 
the contours are drawn using gaussian values, to wit: 
Ax 2 = 2.30, 6.17, 11.8. From this analysis it looks as if 
taking correlations into consideration would increase the 
signal. 



4- Frequentist limit 

If one neglects the logarithmic part in Eq. (30), then 
the robustness for the unbiased mocks becomes the pa- 
rameter goodness-of-fit test [27, 28] which is distributed 
as a x 2 distribution with 3 degrees of freedom. As ex- 
plained in Section II C, however, this is exactly true only 
for the external robustness, but not for the internal ro- 
bustness. Nevertheless, the x 2 distribution with 3 d.o.f. 
captures the overall behavior of a fiducial iR-PDF, as can 
be seen in Fig. 7 where is plotted the quantity 

•Rfroq = ~\ (Xtot ~ Xx ~ xt) (33) 
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Figure 8. Plotted is the distance modulus of the SNe that 
passed the quality cuts (blue) and made it into the final Union 
2.1 catalogue, and the distance modulus of the SNe that did 
not (red). See Section IV B for more details. 



for two mock catalogues (red and blue) together with the 
binned x 2 distribution with 3 d.o.f. (black). Note that 
when the best fits of di and d 2 coincide one has i?f roq = 0. 

B. Union2.1 dataset with previously-excluded SNe 

To further test our systematic search for systematic 
biases, it would be interesting to apply the internal ro- 
bustness method to a dataset for which one indeed ex- 
pects a priori a significant amount of contamination due 
to systematics. Luckily one such sample is readily avail- 
able. The Union2.1 catalogue was in fact constructed by 
enforcing group quality criteria to their full supernova 
set of 753 elements, which resulted in the removal of 173 
SNIa. The criteria were [8]: 

1. that the CMB-centric redshift is greater than 0.015; 

2. that there is at least one point between —15 and 6 
restframe days from B-band maximum light; 

3. that there are at least five valid data points; 

4. that the entire 68% confidence interval for the 
SALT2 parameter x\ lies between —5 and +5; 

5. data from at least 2 bands with rest-frame central 
wavelength coverage between 2900 A and 7000 A; 

6. at least one band redder than rest-frame U-band. 

Now, these 173 SNIa are precisely ones for which sys- 
tematics could be a dominant factor. However, for 38 
of these the lightcurve fitter algorithm did not converge, 
so we do not have a measurement of their distance mod- 
uli. We thus analyzed the Union2.1 catalogue augmented 
with 135 supernovae that did not pass their quality cuts. 
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Figure 9. Same as Fig. 4 for the dataset T>v+ of Fig. 8 (green curve). One can clearly see that in this case the (strong) signal 
of 2?u+ as being systematics-driven comes from the bins in the body of the PDF. See Section IV B for more details. 
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Figure 10. Same as Fig. 5 for the dataset T>u + . The bin height corresponding to T>tj + is shown as a green vertical line, and 
corresponds to the green curve in Fig. 9. These two robustness bins were chosen for this plot as they quite strongly show 
that T>u+ is systematics driven. More precisely, the £>u+ datum relative to the bin (3.75,4] lies at 5.4-cr confidence levels. See 
Section IV B for more details. 



We show these excluded supernovae in Fig. 8. 
refer to this dataset of 715 SNe as £>u+- 



We will 



As before, we have normalized both 2?u+ an d relative 
mocks to x 2 — 1- The dataset 2?tj+ nas indeed a very 
high x 2 = 1-72 which, in order to go to unity, needs a 
quite large added error of of * t tra = 0.070 magnitudes. We 
show the results in Fig. 9, which has been obtained using 
the Pearson distribution (for both left and right panels) 
as explained in Section IV A 2. As compared to the anal- 
ysis of 7_?EdS of Fig. 4, we have used a finer binning in 
the robustness. Now indeed the signal is in the body of 
the iR-PDF and not in the tail. The reason for this dif- 
ferent behavior is that generally, by decreasing the value 
of x 2 , one makes the iR-PDF more peaked and with a 



shorter low-robustness tail. Therefore, in order to put 
the large x 2 °f ^u+ to unity, the latter catalogue has 
been "over normalized", probably because the high value 
of x 2 = 1-72 was due to just a bunch of very biased super- 
novae. From the results of Fig. 9 we can claim a detection 
at more than 3a of the catalogue as being systematics- 
driven, which is a completely independent check that the 
SNe that did not pass the quality cuts were indeed dom- 
inated by systematic effects. 

We show in Fig. 10 the distributions of the bin heights 
for two robustness bins that have a strong exclusion sig- 
nal. We have plotted both the fitted Pearson distribu- 
tion used in Fig. 9 and the fitted gaussian distribution for 
comparison. The 2?tj+ datum relative to the bin (3.75,4] 
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Figure 11. Same as Fig. 6 for the two robustness bins of 
Fig. 10. The bin height corresponding to T>u+ is shown as a 
green disk. See Section IV B for more details. 



lies at 5.4-cr confidence levels. By estimating the uncer- 
tainty on the exclusion signal as explained at the end of 
Section IV A 2, we find that the signal is > 3.8a at 95% 
confidence level. 

Finally in Fig. 11 we show the contour plot of the 
correlated binned distribution of bin heights for the two 
robustness bins of Fig. 10. As before, we show this fig- 
ure to illustrate the correlation between bins with strong 
signal, but we do not use it to assess the statistical prop- 
erties of X>u+ as we only have a limited sample of mock 
catalogues. 

C. Union2.1 dataset 

Now that we have tested the sensitivity of the inter- 
nal robustness method, we will show the results for the 
actual Union2.1 catalogue, to which we will refer as T>\j. 
Fig. 12 shows that we did not find any sign of system- 
atic effects in the Union2.1 compilation. As Union2.1 
is constructed from a collection of different instruments, 
it is a valid concern whether systematics are properly 
accounted for. For instance, individual estimations for 
the Malmquist bias in each of the supernova sub-samples 
were not made [3] and in general all systematics were 
treated by adding nuisance parameters [8]. Moreover, 
their data in the compact form here used, which pro- 
vides directly the distance moduli, assumes a particular 
cosmological model (the SALT2 parameters a and (3 are 
fixed in the best fits values given by such model) and is 
thus not technically independent data, as instead we im- 
plicitly assumed here. Our results arc nevertheless clear, 
and serves as a cross-check on this compilation. This is 
one of the main results of this work. 

As in the previous Sections, we show in Fig. 13 the dis- 
tributions of the bin heights for two robustness bins that 
have the strongest signal. Again, we have plotted both 



the fitted Pearson distribution used in Fig. 12 and the 
fitted gaussian distribution for comparison. Note that, 
similarly to Fig. 4, in the left panel of Fig. 12 the gaus- 
sian distribution has been used. The latter is indeed a 
good template PDF for the robustness bins in the body 
with values < R < 6. In Fig. 14 we show the contour 
plot of the correlated binned distribution of bin heights 
for the latter two robustness bins. As before, we show 
this figure to illustrate the correlation between bins with 
strong signal. 



V. CONCLUSIONS 

In this paper we proposed a new Bayesian statistical 
method, dubbed Internal Robustness, to detect the pres- 
ence of systematics with an automated procedure. While 
a clear physical understanding of all effects contributing 
to any cosmological observation is our ultimate goal, we 
are still far from that level and must thus resort to statis- 
tical inference to guide us. Once identified, systematic- 
contaminated data can be further analyzed and eventu- 
ally corrected (or at least excluded). Fig. 3 nicely sum- 
marizes the ultimate goal of internal robustness, that is, 
to go from the original set in green to the "decontami- 
nated" set in blue. The contours do not look significantly 
different (i.e. same precision), but their position has sub- 
stantially moved of ~ 3tr (i.e. much better accuracy). 

In order to test this method we applied it to three su- 
pernova catalogues, all based on the Union2.1 dataset: 
the artificial £>EdS (with 100 mock supernovae drawn 
from the "wrong" EdS model); the dataset £>u+ which 
includes supernovae that were previously excluded (be- 
cause they were suspected of being dominated by sys- 
tematics); and the "normal" catalogue 2?tj (the stan- 
dard Union2.1). The latter, nevertheless, is constructed 
from an assortment of different instruments, both ground 
and space based, and it is not completely clear a priori 
whether it would exhibit signs of (leftover) systematic 
effects. 

Our results clearly show no evidence for systematic 
contamination in the standard Union2.1 data. This is an 
important result in itself and it serves as a cross-check 
on this very heterogeneous SNe compilation. For the 
other two catalogues, the results are also telling. Us- 
ing a Pearson template to fit the distribution of mock 
data, we find robustly that the 2?Eds an d 2?u+ catalogues 
are systematics-driven at over 3.3-cr and 3.8-c confidence 
level, respectively. 

In the present paper we have focused on assessing if 
a given dataset is dominated by systematics or not. If a 
catalogue does not pass the internal robustness test, then 
the next step would be to find the subsets of data that are 
most probably contaminated. To accomplish the latter 
one needs an efficient algorithm to find partitions with 
values of internal robustness as low as possible. Once the 
contaminated data are found they could be used to help 
test some suggested hypothesis, such as the existence of 
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Figure 12. Same as Fig. 4 for the dataset T>u (pink curve). These plots show that we did not find any sign of systematic effects 
in the Union 2.1 catalogue. See Section IV C for more details. 
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Figure 13. Same as Fig. 5 for the dataset Z?u- The bin height corresponding to T>\j is shown as a pink vertical line, and 
corresponds to the pink curve shown on the right panel of Fig. 12. These two robustness bins were chosen for this plot as they 
have the strongest (even though still weak) signal. See Section IV C for more details. 



two distinct supernova populations, the evolution of the 
supernova intrinsic luminosity with redshift and/or host 
metallicity et cetera. We will address this issue in forth- 
coming work. 
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Figure 14. Same as Fig. 6 for the two robustness bins of 
Fig. 13. The bin height corresponding to 2?u is shown as a 
pink disk. See Section IV C for more details. 



Appendix A: Technical Details 

While scanning supernova subsets and building the ro- 
bustness histogram, one has to carefully analyze small 
subsets. These small subsets produce indeed contours 
which are not only very broad but also have best-fit val- 
ues very far from typical ones obtained with the full cata- 
log. In the {f2 m o> ^ao} parameter space this entails some 
technical problems. 

One issue has to do with "no Big-Bang" scenarios. For 
any positive value of £l m o there exists a positive value of 
S1™q x for which the Hubble function H (z) goes to zero at 
a finite rcdshift. This fl^ K (fl m0 ) line is usually referred 
to as the no-Big-Bang line (or "loitering" line), values 
of £l\o above which are excluded. The same issue arises 
for any negative value of f2 TO o, although this is usually 
correctly perceived as an unimportant technicality as a 
negative fl m o is physically excluded a priori. Here, on 
the other hand, allowing negative il m o is reasonable as 
it will be a hint that systematics are involved (we may 
be misinterpreting a systematic parameter for £l m o - see 
Section II D). Likewise, a likelihood that embraces values 
of f^Ao > ^™o x ma y & l so b e a nm t °f hidden systematics. 

In order to allow as much freedom as possible to the 
likelihood contours, we adopted an "extended" no-Big- 
Bang line: we allow values of {f2 m o,^Ao} that do not 
exhibit a singularity in H(z) for z < 2. This encom- 
passes all supernovae measured so far and opens up a 
large region for which f2 m o < 0; see Fig. 15. 

To find the line that delimits such a region one com- 



putes first the solutions f2™Q X (z, Q m o) for which H(z) — 
0. The minimum of £!™q x (z, f2 m o) with respect to z 
gives the lower redshift for which H(z) = for a 
given f2 m o- The function we are looking for is then 
ri A 1 Q X (z*(fi m o), O m o) where z*(Q m o) is either the redshift 
corresponding to the minimum z m (f2 m o) or 2, whichever 
is lower. In other words, z* = Min(z m ,2). Carrying out 
this calculation one gets 



Ouiax/O . J g (1 4" 2f2 m o) j 
AO ("roOj — \ QnoBBfo ^ 
AO l J 'm0ji 



n m0 < 1/10: 
fl m0 > 1/10: 



(Al) 



where ^ A q BB is the traditional no Big Bang function: 
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+ (x 2 + 2(l + y)-2x(2 + y)) 2/3 ) 

where in turn x = f2 m o an( i V = VI — 2x. 



y)) 



2/3 




Figure 15. Shown is the excluded parameter-space region 
(green shaded area) for which the expansion rate is negative 
for z < 2. This ensures that the distance (proportional to 
J dz/H(z)) to a source at redshift z < 2 is not singular. This 
is a relaxation of the usual no-big-bang region (left of the red 
dashed line). See Appendix A for more details. 
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